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Abstract 

Models for distributions of shapes contained within images can be widely used in biomedical applications 
ranging from tumor tracking for targeted radiation therapy to classifying cells in a blood sample. Our focus 
is on hierarchical probability models for the shape and size of simply connected 2D closed curves, avoiding 
the need to specify landmarks through modeling the entire curve while borrowing information across curves 
for related objects. Prevalent approaches follow a fundamentally different strategy in providing an initial 
point estimate of the curve and/or locations of landmarks, which are then fed into subsequent statistical 
analyses. Such two-stage methods ignore uncertainty in the first stage, and do not allow borrowing of 
information across objects in estimating object shapes and sizes. Our fully Bayesian hierarchical model is 
based on multiscale deformations within a linear combination of cyclic basis characterization, which facilitates 
automatic alignment of the different curves accounting for uncertainty. The characterization is shown to be 
highly flexible in representing 2D closed curves, leading to a nonparametric Bayesian prior with large support. 
Efficient Markov chain Monte Carlo methods are developed for simultaneous analysis of many objects. The 
methods are evaluated through simulation examples and applied to yeast cell imaging data. 

Keywords: Bayesian nonparametrics, cyclic basis, deformation, hierarchical modeling, image cytometry, 
multiscale, 2d shapes 



I. INTRODUCTION 

Collections of shapes are widely studied across many disciplines, such as biomedical imaging, cytology and 
computer vision. Perhaps the most fundamental issue when studying shape is the choice of representation. 
The simplest representations for shape are basic geometric objects, such as ellipses (Cinquin, Chalmond and 
Berard 1982; Amenta, Bern and Kamvysselis 1998; Rossi and Willsky 2003), polygons (Malladi, Sethian and 
Vemuri 1994; Malladi, Sethian and Vemuri 1995; Sederberg, Gao, Wang and Mu 1993; Sato, Wheeler and 
Ikeuchi 1997), and slightly more involved specifications such as superellipsoids (Gong, Pathak, Haynor, Cho 
and Kim 2004). 

Clearly, not all shapes can be adequately characterized by simple geometric objects. The landmark- 
based approach was developed to describe more complex shapes by reducing them to a finite set of landmark 
coordinates. This is appealing because the joint distribution of these landmarks is tractable to analyze, and 
because landmarks make registration/alignment of different shapes straightforward. There is a very rich 
statistical literature on parametric joint distributions for multiple landmarks (Bookstein 1986; Bookstein 
1996c; Bookstein 19966; Bookstein 1996a; Dryden and Mardia 1998; Dryden and Mardia 1993; Mardia and 
Dryden 1989; Dryden and Gattone 2001; Zheng, John, Liao, Boese, Kirschstein, Georgescu, Zhou, Kempfcrt, 
Walther, Brockmann et al. 2010), with some recent work on nonparametric distributions, both frequentist 
(Kume, Dryden and Le 2007; Kent, Mardia, Morris and Aykroyd 2001; Bhattacharya 2008; Bhattacharya 
and Bhattacharya 2009) and Bayesian (Bhattacharya and Dunson 20116; Bhattacharya and Dunson 2010; 
Bhattacharya and Dunson 2011a). 

Unfortunately, in many applications it is not possible to define landmarks if the target collection of 
objects vary greatly. Furthermore, even if landmarks can be chosen, there may be substantial uncertainty 
in estimating their location, which is not accounted for in landmark-based statistical analyses. 

In these situations, one can instead characterize shapes by describing their boundary, using a nonpara- 
metric curve (2D) or surface (3D). Curves and surfaces are widely used in biomedical imaging and commercial 
computer-aided design (Barnhill 1985; Lang and Roschel 1992; Hagen and Santarclli 1992; Aziz, Bata and 
Bhat 2002), because they provide a flexible model for a broad range of objects e.g., cells, pollen grains, 
protein molecules, machine parts, etc. A collection of introductory work on curve and surface modeling 
can be found in Su and Liu (1989) and subsequent developments in Muller (2005). Popular representations 
include: Bezier curves, splines, and principal curves (Hastie and Stuetzle 1989) (a nonlinear generalization 
of principal components, involving smooth curves which 'pass through the middle' of a data cloud). Kurtck, 
Srivastava, Klassen and Ding (2011) and Su, Dryden, Klassen, Le and Srivastava (2011) dealt with curve 
modeling based on smooth stochastic processes. Although there is a hugely vast literature on estimating 
curves and surfaces, most of the focus is on estimating [i : X — > E, where X a compact subset of W without 
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making any constraints on fi. Estimating a closed surface or a curve involves a different modeling strat- 
egy and there has been very few works in this regime, particularly from a Bayesian point of view. To our 
knowledge, only Pati and Dunson (2011) developed a Bayesian approach for fitting a closed surface using 
tensor-products. 

Many of the above curve representations can successfully fit and describe complex shape boundaries, 
but they often have high or infinite dimensionality, and it is not clear how to directly analyze them. Also, 
they were not designed to facilitate comparison between shapes or characterize a collection of shapes. One 
solution is to re-express each curve using Fourier descriptors or wavelet descriptors (Whitney 1937; Zahn and 
Roskies 1972; Mortenson 1985; Persoon and Fu 1977). Both approaches decompose a curve into components 
of different scales, so that the coarsest scale components carry the global approximation information while 
the finer scale components contain the local detailed information. Such multiscale transforms make it easier 
to compare objects that share the same coarse shape, but differ on finer details, or vice versa. The finer scale 
components can also be discarded to yield a finite and low-dimensional representation. Other dimensionality- 
reducing transformations include Principal Component Analysis and Distance Weighted Discrimination. 

Note that the entire process is fragmented into three separate tasks: 1) curve fitting, 2) transformation, 
3) population-level analysis. This can be problematic for several reasons. First, curve-fitting is not always 
accurate. If uncertainty is not accounted for, mistakes made during curve-fitting will be propagated into later 
analyses. Second, dimension-reducing transformations may throw away some of the information captured by 
curve-fitting. Finally, one suspects that the curve-fitting and transformation steps should be able to benefit 
from higher-level observations made during subsequent population analysis. For example, if the curve-fitting 
procedure is struggling to fit a missing or noisy shape boundary, it should be able to draw on similar shapes 
in the population to achieve a more informed fit. In this paper, we propose a Bayesian hierarchical model 
for 2D shapes, which addresses all of the aforementioned problems by performing curve fitting, multiscale 
transformation, and population analysis simultaneously within a single joint model. 

The key innovation in our shape model is a shape-generating random process which can produce the whole 
range of simply-connected 2D shapes (shapes which contain no holes), by applying a sequence of multiscale 
deformations to a novel type of closed curve based on the work of Roth, Juhasz, Schicho and Hoffmann 
(2009). Mokhtarian and Mackworth (1992), Desideri and Janka (2004) and Dcsideri, Abou El Majd and 
Janka (2007) also proposed multiscale curves (with the latter two being more similar to our work, in their 
usage of Bezier curves and degree-elevation). However, none of these developed a statistical model around 
their representation or considered a collection of shapes. In analyzing a population of shapes, a notion of 
average shape or mean shape is quite important. Dryden and Mardia (1998) discussed notions of mean shape 
and shape variability and various methods of estimating them pertaining to landmark based analysis. We 
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will follow a different but related strategy for defining the average shape in terms of the basis coefficients 



or the control points of the Bezier curves. We call it the 'central shape'. Refer to {2.5 for details. To 
characterize shape variability, we also define a notion of shape quantile in §2.5| 

In <|2j we describe the shape-generating random process, how it specifies a multiscale probability distri- 
bution over shapes, and how this can be used to express various modeling assumptions, such as symmetry. 
In fj3j we provide theory regarding the flexibility of our model (support of the prior) . In £j4] and <|5j we 
show how the random process can be used to fit a curve to a point cloud or an image. In Sj6j we show how 
to simultaneously fit and characterize a collection of shapes, which also naturally incorporates inter-shape 
alignment. In SjTJ we describe the computational details of Bayesian inference behind each of the tasks de- 
scribed earlier. This results in a fast approximate algorithm which is scalable to a huge collection of shapes 
having a dense point cloud each. Finally, in fJH] and <j9j we test our model on simulated shapes and real 
image data respectively. En route, we solve several important sub-problems that may be generally useful in 
the study of curve and surface fitting. First, we develop a model-based approach for parameterizing point 
cloud data. Second, we show how fully Bayesian joint modeling can be used to incorporate several pieces 
of auxiliary information in the process of curve-fitting, such as when each point within a point cloud also 
reports a surface orientation. Lastly, the concept of multi-scale deformation can be generalized to 3d surfaces 
in a straightforward manner. 

2. PRIORS FOR MULTISCALE CLOSED CURVES 

2.1 Overview 

Our random shape generation process starts with a closed curve and performs a sequence of multiscale 



deformations to generate a final shape. In {2.2 we introduce the Roth curve developed by Roth et al. 



(2009), which is used to represent the shape boundary. Then, in ( 2.3 we demonstrate how to deform a Roth 
curve at multiple scales to produce any simply-connected shape. Using the mechanisms developed in |2.2| 
and §2.3[ we present the full random shape process in §2.5| 



2.2 Roth curve 

A Roth curve is a closed parametric curve, C : [— tt, tt] — > M 2 , defined by a set of 2n + 1 control points 
{cj , j — 1, . . . , 2n + 1}, where n is the degree of the curve and we may choose it to be any positive integer, 
depending on how many control points are desired. For convenience, we will refer to the total number of 
control points as J, where J(n) = 2n + 1. For notational simplicity, we will drop the dependence of n in 
J {n). As a function of t, the curve can be viewed as the trajectory of a particle over time. At every time 
t, the particle's location is defined as some convex combination of all control points. The weight accorded 
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Figure 1: Roth basis 



to each control point in this convex combination varies with time according to a set of basis functions, 

{Bf(t),j = 1, . . . , J}, where Bf(t) > and B jW = 1 for a11 L 

J 

C(t) = £>i?;(t), * e [-*>*] > (!) 

Lf / 27r(j-l)M" (2"n!) 2 



2" \ V 2n + l /J ' " (2n + l)! ' 

where Cj = [cj ja! c^y]' specifies the location of the j th control point and 23™ : [— ir, n] —> [0, 1] is the j th basis 
function. For simplicity, we omit the superscript n denoting a basis function's degree, unless it requires 
special attention. This representation is a type of Bezier curve. Refer to Figure [l] for an illustration of the 
Roth basis functions. The Roth curve has several appealing properties: 

1. It is fully defined by a finite set of control points, despite being an infinite dimensional curve. 

2. It is always closed, i.e. C(— tt) = C(ir). This is necessary to represent the boundary of a shape. 

3. All basis functions are nonlinear translates of each other, and are evenly spaced over the interval 
[— 7r, 7r]. They can be cyclically permuted without altering the curve. This implies that each control 
point exerts the same 'influence' over the curve. The influence of the control points is illustrated in 

& 

4. A degree 1 Roth curve having 3 control points is always a circle or ellipse. 

5. Any closed curve can be approximated arbitrarily well by a Roth curve, for some large degree n. This 
is because the Roth basis, for a given n, spans the vector space of trigonometric polynomials of degree 
n and as n — > oo, the basis functions span the vector space of Fourier series. We elaborate on this in 




Figure 2: Deformation of a Roth curve 

6. Roth curves are infinitely smooth in the sense that they are infinitely differentiable (C°°). 
2.3 Deforming a Roth curve 

A Roth curve can be deformed simply by translating some of its control points. We now formally define 
deformation and illustrate it in Figure [2j 

Definition Suppose we are given two Roth curves, 

J J 
C(t) = Y,c j B j (t) ) C{t)=Y^Bj{t\ (3) 

where for each j, Cj = Cj + Rjdj, dj £ K 2 and Rj is a rotation matrix. Then, we say that C{t) is deformed 
into C(t) by the deformation vectors {dj,j = 1 , . . . , J}. 

Each Rj orients the deformation vector dj relative to the original curve's surface. As a result, positive 
values for the y-componcnt of dj always correspond to outward deformation, negative values always corre- 
spond to inward deformation, and dj's x-component corresponds to deformation parallel to the surface. We 
will call Rj a deformation- orienting matrix. In precise terms, 

Rj = 

where 9j is the angle of the curve's tangent line at qj = ~ 2 2n+i ' ^ e P om t where the control point Cj has 
the strongest influence: qj = arg max Bj(t). 9j can be obtained by computing the first-derivative of the 

t£[a,b] 

Roth curve, also known as its hodograph. 



cos(6>j) — sin(0j-) 



sin(#j) cos(9j) 



(4) 



6 



Definition The hodograph of a Roth curve is given by: 

Hit) = X>^(*)> ( 5 ) 
i=i 

, . 2 ^ ^/2n\, . /, 2(n- k)(j - 1W\ 

-5, m = > Co > , (n. - k) sin (n - fc)t + — — — , (6) 

dt A> (2n + 1) ( 2 ™) j-[ f^ \ k J ' V ' 2n+l ^ { > 

where t £ [— tt,it]. If we view C(t) as the trajectory of a particle, H(t) intuitively gives the velocity of the 
particle at point t. 

We can now use simple trigonometry to determine that 

6 j = arctan ( §4^4) ■ ( 7 ) 



H x {q ) 

Note that Rj is ultimately just a function of {cj S R 2 , j — 1, . . . , J}. 

Next, we show how to alter the scale of deformation, using an important concept called degree elevation. 

Definition Given any Roth curve, we can use degree elevation to re-express the same curve using a larger 
number of control points (a higher degree). More precisely, if we are given a curve of degree n, C(t) = 
X) 2 ^ 1 Cji?"(i), we can elevate its degree by any positive integer v, to obtain a new degree elevated curve: 
d(t) = Y,f=i' V)+1 Cj B j +V (t) such that C(t) = C{t) for all t G [-7r,7r]. In d(t), each new degree-elevated 
control point, Cj, can be defined in terms of the original control points, {ci, i = 1, . . . , In + 1}: 

? . 1 y c , gg^K y (?) ^ C0 J (n k) ( -W-V* \ | 2(n-fc)(i-l), 
J -~2n + l^ * + 2 2 "" 1 L/2(»+»))L COS ^ ^ \,2(n + u) + V 2n + 1 

■i — 1 — V v ~\~ k / i — 1 

It is crucial to note that the 'influence' of a single control point shrinks after degree elevation. We 



quantify this intuition in £3.3 This is because the curve is now shared by a greater total number of control 
points. This implies that after degree-elevation, the translation of any single control point will cause a 
smaller, finer-scale deformation to the curve's shape. Thus, degree elevation can be used to adjust the scale 
of deformation. We exploit this strategy in the random shape process proposed in §2.5| 

To that end, we first rewrite all of the concepts described above in more compact vector notation. Note 
that the formulas for degree elevation, deformation, the hodograph and the curve itself all simply involve 
linear operations on the control points. 

2.4 Vector notation 

First, we rewrite the control points in a 'stacked' vector of length 2 J. 

c=( Cl )'■ (8) 
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The formula for a Roth curve given in ([lj can be rewritten as: 

C(t) = X(t)c (9) 

Bi(t) B 2 (t) ••• Bj(t) 
X{t) = w w v ; (10) 

B x {t) B 2 (t) ••• Bj(t)_ 
The formula for the hodograph given in ^ is rewritten as: 

H{t)=x\t)c, X{t) = jX{t) (11) 

Deformation can be written as: 

c = c + T(c)d, d = (di iX ,di, y ,d 2 ,x,d2,y, ■ ■ ■ ,dj, x ,d JtV )' , T(c) = block(i?i,i? 2 , • • • ,Rj) (12) 

where block(A 1 , . . . , A q ) is a pq x pq block diagonal matrix using p x p matrices A i7 i = 1, . . . , q. We call T 
the stacked deformation-orientating matrix. Note that T is a function of c, because each Rj depends on c. 
Degree elevation can be written as the linear operator, E: 

c = Ec, E=(E l ,)^ 1 %v 

where 

E-. 1 i ^^- V (?) cos ( (n k) ( -2(i-Dn \ 2(n-k)U-l)n 

We will maintain this vector notation throughout the rest of the paper. 
2.5 Random Shape Process 

The random shape process starts with some initial Roth curve, specified by an initial set of control points, 
c (o) _ p rom here on, we will refer to all curves by the stacked vector of their control points, c. Then, drawing on 
the deformation and degree-elevation operations defined earlier, we repeatedly apply the following recursive 
operation R times: 

a**- 1 ) = £ r c (r-1) , d {r) ~ N( Mr , E r ), c (r) = c (r " 1) + T r (c( r -V)d( r) (13) 

resulting in a final curve c^. In other words, the process simply has two steps: (i) degree elevate the current 
curve, (ii) randomly deform it, and repeat a total of R times. Note that this random process specifies a 
probability distribution over c^ R \ 

We now elaborate on the details of this recursive process. The parameters of the process are: 

1. R € Z, the number of steps in the process 
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Figure 3: An illustration of the shape generation process. From left to right: 1) initial curve 
specified by three control points, 2) the same curve after degree elevation, 3) deformation, 4) degree 
elevation again, 5) deformation again. Dark lines indicate the curve, pale dots indicate the curve's 
control points, and pale lines connect the control points in order. 

2. n r € Z, the degree of the curve c^ r \ for each r = 1, . . . , R. The sequence of {n r }^ must be strictly 
monotonically increasing. For convenience, we will denote the number of control points at a certain 
step r to be J r = 2n r + 1. 

3. /Lt r <G R 2Jr , the average set of deformations applied at step r = 1, . . . , R. Note that this vector contains 
a stack of deformations, not just one. 

4. e jj2./,.x2j r ^ foe covariance in the set of deformations applied at step r = 1,...,R. 

According to these parameters, E r is then the degree-elevation matrix going from degree n r _ito n r , N(-, •) 
is a 2 J r -variate normal distribution and T r is the stacked deformation orienting matrix. 

We take special care in denning the initial curve, c^K We choose to be degree uq — 1, which 
guarantees that it is an ellipse. For j — 1, 2, 3, we define each control point as: 

cf = (0,0)' + R ej df\ 

2nj 

Re j — rotation matrix where 6j = — - , 

3 

and where each 6 R 2 is a random deformation vector. In words: we start with a curve that is just 
a point at the origin, C(t) = (0,0), and apply three random deformations which are rotated by a radially 
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symmetric amount: 0°,120° and 240° (note that the final deformations are not radially symmetric, since 
each dj is randomly drawn). We will write this in vector notation as: 

d<°> ~ N(W),E ) 
c<°> = + T d (0) 

The deformations essentially 'inflate' the curve into some ellipse. This completes our definition of the random 
shape process. 

We now give some intuition about the process and each of its parameters, and define several additional 
concepts which make the process easier to interpret. The random shape process gives a multiscale repre- 
sentation of shapes, because each step in the process produces increasingly fine-scale deformations, through 
degree-elevation . 

R is then the number of scales or 'resolutions' captured by the process. Each n r specifies the number of 
control points at resolution r. We will use S r to denote the class of shapes that can be exactly represented 
by a degree n r Roth curve. See the definition of W n " r in f|3] for a formal characterization of a special case of 
S r . If {n r }f is monotonically increasing, then Si C §2 C • ■ ■ C Sr. Thus, the deformations d^ roughly 
describe the additional details gained going from § r _i to S r . 

jj, r is the mean deformation at level r. Based on {/z r , r = 0, . . . , R}, we define the 'central shape' of the 
random shape process, c* as: 

c* := c*W 

Note that c* is simply the deterministic result of the random shape process when each d^ — [i r , rather 
than being drawn from a distribution centered on fi r . Thus, all shapes generated by the process tend to be 
deformed versions of the central shape. We illustrate this in Figure [4] If the random shape process is used 
to describe a population of shapes, the central shape provides a good summary. 

S r determines the covariance of the deformations at level r. This naturally controls the variability among 
shapes generated by the process. If the variance is very small, all shapes will be very similar to the central 
shape. S r can also be chosen to induce correlation between deformation vectors at the same resolution, in 
the typical way that correlation is induced between dimensions of a multivariate normal distribution. This 
allows us to incorporate higher-level assumptions about shape, such as reflected or radial symmetry. For 

(2) (2) 

example, if R = 2,ni = 1 and n<x = 2, we can specify perfect correlation in S2, such that d\ — d\ and 

(2) (2) 

d 2 — d 3 . The resulting shapes arc guaranteed to be symmetrical along an axis of reflection. 

In the subsequent sections [4] and [5j we show how to use our random shape process to guide curve-fitting 
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Figure 4: Realizations from the random shape process. The left panel shows realizations (in blue) 
when the central shape is a moon (shown in red). The right panel shows a similar case for a star. 

for various types of data. When doing so, we would like each resolution r to describe the shape as best as 
possible, within its class S r . This can be achieved by setting fi r = for r = 1, . . . , R (not including /iq). 

3. PROPERTIES OF THE PRIOR 

3.1 General notations 

The supremum and Li-norm are denoted by || • Hoc and || • ||i, respectively. We let || • \\ pv denote the norm of 
L p (v), the space of measurable functions with ^-integrable pth absolute power. The notation C(X) is used 
for the space of continuous functions / : X M endowed with the uniform norm. For a > , we let C a (X) 
denote the Holder space of order o, consisting of the functions / G C(X) that have [a\ continuous derivatives 
with the [ajth derivative /L Q J being Lipshitz continuous of order a— [a\ . We write for inequality up to a 
constant multiple and {a(i), «(2)i • ■ • j a (n)} to denote the order statistics of the set {a^ : a, G R, i = 1, . . . , n}. 

3.2 Support 

Let the Holder class of periodic functions on [— it, tt] of order a be denoted by C a ([— tt, it]). Define the class 
of closed parametric curves Sc{ai, a 2 ) having different smoothness along different coordinates as 

S e (a u a 2 ) ~ {S = {S\S 2 ) : [-tt.tt] -> K 2 ,S l € C7 Q -([-vr,^),i = 1,2}. (14) 

Consider for simplicity a single resolution Roth curve with control points {cj,j = 0, ...,2n}. Assume 
we have independent Gaussian priors on each of the two coordinates of Cj for j = 0, . . . , 2n, i.e., C{t) = 
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Ej"o c i B "W> Cj ~ N 2(0, o£r 2 ), i = 0, ■ • . , 2n. Denote the prior for C by Tic™ ■ He* defines an independent 
Gaussian process for each of the components of C . Technically speaking, the support of a prior is defined 
as the smallest closed set with probability one. Intuitively, the support characterizes the variety of prior 
realizations along with those which are in their limit. We construct a prior distribution to have large 
support so that the prior realizations are flexible enough to approximate the true underlying target object. 
As reviewed in van der Vaart and van Zanten (2008), the support of a Gaussian process (in our case lie™) 



is the closure of the corresponding reproducing kernel Hilbcrt space (RKHS). The following Lemma 3.1 
describes the RKHS of He™, which is a special case of Lemma 2 in Pati and Dunson (2011). 

Lemma 3.1 The RKHS H™ of Tic™ consists of all functions h : [— tt, it] — > R 2 of the form 

2n 

h(t) = J2cjBJ(t) (15) 

3=0 

where the weights Cj range over R 2 . The RKHS norm is given by 

2n 

\\h\\ 2 m =^2\\c j \\ 2 M- (16) 

The following theorem describes how well an arbitrary closed parametric surface Sq G Sc (0:1,02) can be 
approximated by the elements of H™ for each n. Refer to Appendix [A] for a proof. 

Theorem 3.2 For any fixed Sq £ Sc(ai,a2), there exists h 6 H" with ||/i||jjj re < K\ Y^j=o V* 7 ! such that 

||Sb-ft|U <K 2 n- a w\ogn (17) 
for some constants K\,K% > independent of n. 

This shows that the Roth basis expansion is sufficiently flexible to approximate any closed curve arbitrarily 
well. Although we have only shown large support of the prior under independent Gaussian priors on the 
control points, the multiscale structure should be even more flexible and hence rich enough to characterize 
any closed curve. We can also expect minimax optimal posterior contraction rates using the prior Uc n 
similar to Theorem 2 in Pati and Dunson (2011) for suitable choices of prior distributions on n. 

3.3 Influence of the control points 

The unique maximum of basis function B™(t) defined in (JTJ) is at t = —2iv(j — l)/J, therefore the control point 
Cj has the most significant effect on the shape of the curve in the neighborhood of the point C{— 27r(j — 1)/ J) . 
Note that B"(t) vanishes at t = tt — 2n(j — 1)/J, thus Cj has no effect on the corresponding point i.e., the 
point of the curve is invariant under the modication of cj . The control point Cj affects all other points of the 
curve, i.e. the curve is globally controlled. These properties are illustrated in Figure [5j 
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Figure 5: Influence of the control point on the Roth curve 



However, we emphasize following Proposition 5 in Roth et al. (2009) that while control points have a 
global effect on the shape, this inuence tends to be local and dramatically decreases on further parts of the 
curve, especially for higher values of n. 



4. INFERENCE FROM POINT CLOUD DATA 

We now demonstrate how our multiscale closed curve process can be used as a prior distribution for fitting 
a shape to a 2D point cloud. As a byproduct of fitting, we also obtain an intuitive description of the shape 
in terms of deformation vectors. 

Assume that the data consist of points {pi € 5R 2 ,i = 1, . . . , N} concentrated near a 2D closed curve. 
Since a Roth curve can be thought of as a function expressing the trajectory of a particle over time, we view 
each data point, pi, as a noisy observation of the particle's location at a given time ti, 

p i = C(t i ) + e h e^N 2 {0,a 2 I 2 ) (18) 



( 18 ) shares a similar form to nonlinear factor models, where ti is the latent factor score. We start by 
specifying the likelihood and prior distributions conditionally on the £jS. We now rewrite the point cloud 
model in stacked vector notation. Defining 

P = (Pl,x,Pl,y, ■ ■ ■ ,PN,x,PN,y)' , £ = ^l,yi ■ ■ ■ > e N,x, ^N,y) 

t = (tt,x>ti,y, • • ■ , ijv, x > *iv,j/) i X(t) = [X(ti)'X(t 2 y ■ ■ ■ X(t N )'] 

we have 

p = X{t)c + e } e~ N 2N {0,<j 2 I 2N ) (19) 



where X(ti) is as defined in (11 1 
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To fit a Roth curve through the data, we want to infer P(c \ p), the posterior distribution over control 
points c, given the data points p. To compute this, we must specify P(p | c), the likelihood, and -P(c), the 



prior distribution over Roth curves specified by c. Refer to (13) in S2.5 for a multiscale prior P(c). From 



(19), we can specify the likelihood function as, 

N . J v 

P{{Pi\i I {ci}{) = J] N 2 {p l -Y,CjB ] {t l ) 1 a 2 I 2 y P{p | c) = N 2N {p;X(t) Cl a 2 I 2N ), in vector not^O) 
i=i ^ j=i ' 

This completes the Bayesian formulation for inferring c, given p and t. In <|7j we describe the exact method 
for performing Bayesian inference. As a byproduct of inference, we also infer the deformation vectors 
for , r = 1, . . . , R. Due to their multiscale organization, they may describe shape in a more intuitive manner 
than {cj,j = 1, . . . , J}. 

We propose a prior for ti conditionally on c, which is designed to be uniform over the curve's arc-length. 
This prior is motivated by the frequentist literature on arc- length parameterizations (Madi 2004) , but instead 
of replacing the points {pi € 5i 2 } with {ti € [— 7r,7r]} in a deterministic preliminary step prior to statistical 
analysis, we use a Bayesian approach to formally accommodate uncertainty in parameterization of the points. 
Define the arc-length function A : [—%, n] h-> M + 

A(u):=A(u;(c ,...,c 2n ))= j \\H(t)\\dt. (21) 

J — 7T 

Note that A is monotonically increasing and satisfies A(—tt) = 0, A(tt) = L(co, . . . , c 2n ) where L(cq, . . . , c 2n ) 
is the length of the curve conditional on the control points (co, . . . , c 2n ) and is given by J_ ||i?(t)||dt. 

Given (cq, . . . ,c 2n ), we draw ^ ~ Unif(0, L(cq, . . . ,c 2n )) and set ti = A^iU). Thus we obtain a prior 
for the tj's which is uniform along the length of the curve. We will discuss a novel griddy Gibbs algorithm 
for implementing the arc-length parametrization in a fully Bayesian framework in Sj7| 

5. INFERENCES FROM PIXELATED IMAGE DATA 

In this section, we define a hierarchical Bayesian model for point cloud data concentrated near a 2d closed 
curve. We also show how image data gives a bonus estimate for the object's surface orientation, at each 
point pi. We incorporate this extra information into our model to obtain an even better shape fit, with 
essentially no sacrifice in computational efficiency. 

A grayscale image can be treated as a function Z : K 2 — ► K. The gradient of this function, V ' Z : K 2 — ► K 2 
is a vector field, where VZ(x,y) is a vector pointing in the direction of steepest ascent. In computer vision, 
it is well known that the gradient norm of the image, ||VZ||2 : K 2 — >■ K approximates a 'line-drawing' of all 
the high-contrast edges in the image. Our goal is to fit the edges in the image with our shape model. 

In practice, an image is discretized into pixels {z a b \ a = 1, . . . , X, b = 1, . . . , Y} but a discrete version 
of the gradient can still be computed by taking the difference between neighboring pixels, such that one 
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gradient vector, g a j, is computed at each pixel. The image's gradient norm is then just another image, where 
each pixel m a ^ = \\g a ,b\\2- 

Finally, we extract a point cloud: {(a, 6) | rn a ,& > M, a = 1, . . . , X, b = 1, . . . , Y} where M is some 
user-specified threshold. Each point (a, b) can still be matched to a gradient vector g a ^. For convenience, we 
will re-index them as pi and gi. The gradient vector points in the direction of steepest change in contrast, i.e. 
it points across the edge of the object, approximating the object's surface normal. The surface orientation 
is then just Wj = arctan(^ jJ£ ). 

In the following, we describe a model relating a Roth curve to each Wj. This model can be used together 
with the model we specified earlier for the p,. 

5.1 Modeling surface orientation 

Denote by Vi = (H x (ti),H y (ti)) £ K 2 the velocity vector of the curve C(i) at the parameterization location 
t%,i — 1,...,N. Note that Vi is always tangent to the curve. Since each tOj points roughly normal to the 
curve, we can rotate all of them by 90 degrees, 9i = w, + ^, and treat each 9 t as a noisy estimate of Uj's 
orientation. Note that we cannot rotate the vector gi by 90 degrees and directly treat it as a noisy observation 
of Vi. In particular, gi 's magnitude bears no relationship to the magnitude of Vf. \\gi\\ is the rate of change 
in image brightness when crossing the edge of the shape, while describes the speed at which the curve 
passes through pi. 

Suppose we did have some noisy observation of denoted u,. Then, we could have specified the following 
linear model relating the curve {cj,j = 1, . . . , J} to the itj's: 

Ui = m + Si (22) 

= Y< c iJt Bjijti) + 5i (23) 
j'=i 

for % = 1, . . . , N where Si ~ N 2 (0, t 2 I 2 ). Instead, we only know the angle of itj, 6i. In sJTJ we show that using 
this model, we can still write the likelihood for by marginalizing out the unknown magnitude of Ui. The 
resulting likelihood still results in conditional conjugacy of the control points. 

6. FITTING A POPULATION OF SHAPES 

We can easily generalize the methodology above to fit a collection of K separate point clouds, and characterize 
the resulting population of shapes, represented by a closed curve. Continuing the vector notation earlier, 
we will represent the k th point cloud in a stacked vector p k , the corresponding parametrizations t k , surface 
orientations & k , and the control points corresponding to that point cloud as c k . Finally, we will denote the 
deformations which produce the curve for shape k as S r ^ ,k for each step r in the shape process. 
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Up to this point, the parameters specifying each closed curve are separate and independent. This is 
sufficient if we just wish to fit each point cloud independently. However, now we aim to characterize all the 
curves as a single population. To do so, we treat each curve as an observation generated from a single random 
shape process. We borrow information across the population of curves through sharing hyperparameters of 
our multiscale deformation model or by shrinking to a common value by assigning a hyperprior. These 
inferred hyperparameters and the uncertainty in estimating them will effectively characterize the whole 
population of shapes. This is a hierarchical modeling strategy that is often used to characterize a population. 

The hyperparameters of our random shape process are /i r and S r for r — 1, . . . , R. We can treat each 
fj, r as an unknown and place the following prior on it: 

Mr ~ N 2 j r {n tlr ,^ r ) (24) 

By assuming that all shapes are generated from a single shape process with unknown /x r 's, we are basically 



assuming that all shapes are deformed versions of one 'central shape' (defined earlier in 5 2.5) where the 
variability in deformation at scale r is £ r . 

Now suppose that each shape is rotated to a different angle. In this case, it may not be ideal to share 
all the deformations, because this would assume that all shapes are at exactly the same angle. One solution 
is to simply make the Ei very large, allowing large variation in the d^' k . These deformations define the 
coarsest outline of each shape, which may be an ellipse. If these have wide variance, each coarse outline 
may be rotated to a different angle. Furthermore, since all subsequent deformations are defined relative to 
this initial coarse outline, different shapes can share the exact same deformations even if they are rotated to 
different angles. 

Note that with this modification, all rotated shapes have essentially been aligned with each other, because 
all shape details expressed by the deformations for r > 1 have been matched up. 



7. POSTERIOR COMPUTATION 
7.1 An approximation to the deformation-orienting matrix for the deformation vector 
Observe that since T r (c^' fe ) may not be linear in c^ ,k , due to the arctan in Q, the full conditional 
distribution of c^~'' k is not conditionally conjugate. Below, we develop a novel approximation to the rotation 
matrix T r of T(c < - r_1 ^' c ). The approximation ensures that T r (c' r-1 "*) is linear in the level r— 1 control points 
c \T-i),k w hj c h results in conditional conjugacy of c^' _1 - ), ' c . We resort to a Metropolis Hastings algorithm 
with an independent proposal suggested by the approximation to correct for the approximation error. For 
j = 1, . . . , J r , let qj tr — ~2n(j — 1)/J r . Recall that X{qj^. r )c^' k is the hodograph evaluated at qj >r . 

Proposition 1 Rj in can be approximated by Rj,j — 1, . . . , J r , where Rj are approximate rotation 
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matrices given by 

(25) 



2tt 

— x 
L A 



_X y (q j<r )c^-^ k X x {q jir )c( r -V> k _ 
and La is an approximation for the length of the curve A(n; c( r_1 '' fc ) formed by the control points c( r-1 ) ,fc . 



Proof Refer to the definition of Rj in (j4J). First we derive an approximation for cos(#*) and sin(6**) where 
9* is the angle at the point q^ r of the curve specified by c^'~ 1 ^' k . We write 6* in vector notation 

9* = arctan ( * vi ?' r ]^* ) ■ (26) 



Using the identities 



we obtain, 



X x (q j!r )e(r-V, k 



cos arctan(a;/?/) = — , sinarctan(a;/i/) = — ' (27) 

\J x 2 + y 2 y/ x 2 + y 2 

cos(0*) = *A%,t)C 

yj{X x {q^)6 r -^ k ? + (X y (q j>r )c(r-V, k )2 

X (a- )A r ~^' k 
yf(X x ( qj , r )c(r-l).») 2 + (Xyiq^-^y 



The magnitude y (X x (qj yr )c( r 1 )' k ) 2 + (X y ( q j^ r )c l ' r ~ 1 ^ k ) 2 of the velocity vector at the point q - >r can be 
well approximated by the quantity I j n view of the uniform arc-length parameterization discussed 



in §??. Hence 



COS (0*) « — YJn, Ar(r-l),k 



2tt 



Sin( ^ } " A(Tr;c<.r-V> k ) Xy{qj < r)d 



(r—l),k 



Plugging in in a fixed approximation La for the length of the curve A(n; c^ 1 - ),fc ), we obtain the required 
result. □ 

7.2 Conditional posteriors for m and 

Before deriving the conditional posteriors, we first introduce some simplifying notation. Recall from §2.5| 
that 



=W = E r c( p - 1 )+T r (s rC ( , - 1 ))d( r 5 
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Using the approximation f r c {r ~^ « T r {E r c^-^)d^\ we then have cM w fj5 r + T r ) c^ 1 ). In the new 



arrangement, we can now cleanly write c^ R ' in terms of the base case, 







nlL„(^r + T r ) ifa<6 
1 otherwise. 



Hence can be approximated by 



nf c <°>. 



Note that the terms in are the source of approximation error. Given this expression, we can easily 

write in terms of m and , 

We can also write c^- 1 in terms of c' 1 " -1 ) and c?^ r ', for any r = 1, . . . , R 

Note that as r approaches i?, involves fewer factors and the amount of approximation error decreases. 

We are now ready to derive the conditional posteriors for m k and (as in sJH] we are using a superscript 
k to denote variables for the k th shape). First, we claim that all posteriors can be written in the following 
form for generic l x\ l y' and l z\ 

P(x\-) cx N(y;Qx,X y ) N(x;z,S x ) (28) 



P(x | -) 
S 11 



(29) 



Note that each conditional posterior is simply a multivariate normal. We now prove that each posterior can 



be rearranged to match the form of (28 1 - (29 1. 



P(m k \-) cx N(p k ;X(t k )c^> k ,<j 2 I 2Nk ) N(m fc ;^ m ,£ m ) 

cx N(p k ;X(t k )nf' k (m k + T dW> k ),a 2 I 2m ) N(m fc ;/i m ,E m ) 

cx N (p k -X(t k )n?' k T dW> k ;X(t k )n?' k m k ,a 2 I 2N ^ N(m*; Mm , £ m ) 



P(d (r) | -) oc N(p fc ;X(t fc )c W ' fe ,cr 2 / 2Wfc ) N(d (r) ;^r,Sr) 



oc N 



(p fc ;X(t fc )fi* +1 



rf (r 



,a 2 /. 



N(d«;/i r ,E r ) 



N(p fc ~X(^)r!f +1 S r c ( '- 1) ;X(i fe )r!f +l r r (c^- 1 ') d< r \ a 2 / 2Jvfc ) N(d«;/x r 
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7.3 Conditional update for a 2 I^ Nk = T p x 



K N k 



P{t p \~) cx J]IjN(p l fc ;p^,r- 1 I 2 ) Ga(r p ;a,/3) 



fc=l i=l 



cx r p tot exp 



fc=l i=l 



cx r" +JVto *- 1 exp 



if JV fc 



- + £££llrf-rt*lla U 



fc=i j=i 



Where pf* = X(t* ; )c ( - i? ^' c and 7V tot = X)fe=i -^k- The conditional posterior distribution is then 



P(t p \-) ~ Ga(d,/3) 



where 



d = a + N tot - 1 d = d + 



K N k 

££ll^-pMli 

fc=i j=i 



7.4 Likelihood contribution from surface-normals 
Define 



X x (ti 

Xy(t t 



dt 



dt 



dt 



dt 



dB^(U 



■ ,0, 



dt '° 
dBZM 



dt 



(30) 
(31) 



Proposition 2 XTie likelihood contribution of the tangent directions 9 k ,i = l,...,N k ensures conjugate 
updates of the control points for a multivariate normal prior. 



Proof Recall the noisy tangent director vectors u k, s and vf's in (22). Use a simple reparameterization 

u\ = {m k ,m k tan 6 k ) 



where only O^s are observed and mds aren't. Observe that 

v k = (H x (U),H y (U)) = (X x (t k )c^ k ,X y (t k )c^ k ). 



(32) 



Assuming a non-informative prior for the mf's on K, the marginal likelihood of the tangent direction Q\ 



given r 2 and the parameterization t k is given by 



1 



2tTT 2 



exp 



~{(m? - X^c^) 2 + (mf tan(0,) - X,(if ) C ( 3 )> fe ) 2 } 



dm* 
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It turns out the above expression has a closed form given by 



V27 



2ttt 2 ^l + tan 2 (6f) 



exp 



'2t 2 P x[i) ' +[A y^ )C > l+tan 2 (^) 



The likelihood for the {6%, i = 1, . . . , N k } is given by 



L(9 k ,...,6 k Nk ) (x exp 

T 



JL^ ^ (X z (^)c( 3 )' fc ) 2 tan 2 (g a ) + {X y {t k i )c^ k f - 2X x (t k )cM^ k X y {t k )c ( - 1 ^ k tan(0 ( 



1 + tan 2 (6*,) 



exp 



A' 



2r 



where 



taii 2 (ef ) 



-tan(fl^) 



^ _ l+tan 2 (0f) l+tan 2 (ef; 
~~ -tan(fl, fc ) 1 



l+tan 2 (ef) l+tan 2 (9 i fc ) 

andT, = [(X x (t k ))' {X y (t k )'\ is a 2(2n 3 + l) x 2 matrix. Clearly, an inverse-Gamma for r 2 and a multivariate 
normal prior for the control points are conjugate choices. □ 

7.5 Griddy Gibbs updates for the parameterizations t% 

We discretize the possible values of t k G [— 7r, 7t] to obtain a discrete approximation of its conditional posterior: 

k N(pfrXfc)'c( 3 )-V 2 J 2 ) 

We can make this arbitrarily accurate, by making a finer summation over r. 



Proof of Theorem 



3.2 



8. SIMULATION STUDY 

9. CASE STUDY 

A. PROOFS OF MAIN RESULTS 
From (Stepanets 1974) and observing that the basis functions {BJ,j = 0, . . . , 2n} span 



the vector space of trigonometric polynomials of degree at most n, it follows that given any Sq G C ai ([— tt, tt]), 



there exists ti(u) = Y^l c)B^{u), ^ : [ _7r ' 7r ] ^ M with l c jl - M *' such that H ft< ~ ^olU < i^n _a< log7 
for some constants M i} Ki > 0,i = 1,2. Setting = X^=o( c j ' c j)'-^j l ( u )' we nave 

||A-SblL < Mn-^logn 



with I HI 2 ; < ifE^o & where M = M (2) ,K = K {2) . 
Proof of Lemma 13.11 
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